Bufr process #177

Draft
yuvraajnarula wants to merge 14 commits into openclimatefix:main from yuvraajnarula:bufr_process

@yuvraajnarula
Contributor

Pull Request

Description

This PR adds a new BUFR processor module to enable reading and decoding NOMADS BUFR files into the NNJA-AI-compatible Parquet format.

The processor is designed to:

  • Decode BUFR messages using ecCodes (via the Python bindings)
  • Convert decoded data to the NNJA-AI archive schema, enabling seamless integration with existing workflows
  • Support decoding of initial high-priority observation types:
      • ADPUPA (upper-air soundings)
      • CrIS and IASI hyperspectral soundings
  • Serve as a modular component, so it can later be split into a dedicated repo for broader operational use

Fixes #170
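
As a rough illustration of the decode-and-map steps listed above, here is a minimal sketch. It is not the code in this PR. The ecCodes calls (`codes_bufr_new_from_file`, `codes_set`, `codes_get_array`, `codes_release`) are the standard Python bindings, but the helper names `decode_messages` and `to_schema`, and the output column names in `ADPUPA_FIELD_MAPPINGS`, are assumptions made up for this example rather than the real NNJA-AI schema.

```python
from typing import Any, Dict, Iterator, List

# Hypothetical mapping from ecCodes BUFR keys to output column names.
# The names on the right are placeholders, not the real NNJA-AI schema.
ADPUPA_FIELD_MAPPINGS: Dict[str, str] = {
    "latitude": "lat",
    "longitude": "lon",
    "pressure": "pressure_pa",
    "airTemperature": "temperature_k",
}


def decode_messages(path: str, keys: List[str]) -> Iterator[Dict[str, Any]]:
    """Yield one dict of raw value arrays per BUFR message in the file."""
    import eccodes  # imported lazily so the mapping step works without ecCodes

    with open(path, "rb") as handle:
        while True:
            msg = eccodes.codes_bufr_new_from_file(handle)
            if msg is None:  # no more messages in the file
                break
            try:
                # Expand the data section so the data keys become readable.
                eccodes.codes_set(msg, "unpack", 1)
                # Assumes every requested key is present in the message.
                yield {key: eccodes.codes_get_array(msg, key) for key in keys}
            finally:
                eccodes.codes_release(msg)


def to_schema(decoded: Dict[str, Any], mappings: Dict[str, str]) -> Dict[str, Any]:
    """Rename decoded keys to schema columns; unmapped keys are dropped."""
    return {column: decoded[key] for key, column in mappings.items() if key in decoded}
```

Each dict returned by `to_schema` could then be collected into a table and written to Parquet. Keeping the mapping as plain data makes it easy to extend once the full list of ADPUPA variables is agreed.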

How Has This Been Tested?

Added a pytest test folder with a test that:

  • Reads a BUFR file from the NNJA archive

  • Converts it to Parquet

  • Compares the output with a reference NNJA-AI Parquet file

  • Passes if schemas and values match exactly
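
The "schemas and values match exactly" check above could look roughly like the sketch below. It uses plain dicts of lists as a stand-in for the two Parquet tables, so it runs without extra dependencies. `Table`, `_same` and `tables_match` are hypothetical names for this example, and treating NaN as equal to NaN is an assumption about what "exactly" should mean for missing values.

```python
import math
from typing import Any, Dict, List

# Column name -> values; a stand-in for a table loaded from Parquet.
Table = Dict[str, List[Any]]


def _same(a: Any, b: Any) -> bool:
    """Exact equality of type and value, except that NaN matches NaN."""
    if isinstance(a, float) and isinstance(b, float) and math.isnan(a) and math.isnan(b):
        return True
    return type(a) is type(b) and a == b


def tables_match(produced: Table, reference: Table) -> bool:
    """True if both tables have the same columns, in the same order, with identical values."""
    if list(produced) != list(reference):
        return False
    for name in reference:
        left, right = produced[name], reference[name]
        if len(left) != len(right):
            return False
        if not all(_same(a, b) for a, b in zip(left, right)):
            return False
    return True
```

With real Parquet files loaded into DataFrames, `pandas.testing.assert_frame_equal` would be a natural alternative, since it also reports which column differs.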

  • Yes

If your changes affect data processing, have you plotted any changes? i.e. have you done a quick sanity check?

  • Yes

Checklist:

  • My code follows OCF's coding style guidelines
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked my code and corrected any misspellings

@yuvraajnarula
Contributor Author

@jacobbieker, I tried to find some files for this, but they were too large to use in tests. Would you mind if I showed you the output for a small batch of those files, so that I can check whether it matches your vision?

Member

@jacobbieker left a comment

There need to be some more changes for this, but thanks for the nice first step.

As for the test files, I think a good compromise would be to have some integration tests that pull real, historical NNJA BUFR files and the corresponding NNJA-AI representation, run the processing, and confirm that they match everywhere. These can be marked with pytest.mark.skip so they are skipped in GitHub CI, but then I can run them locally and see if they match exactly.

For this, I would also probably cut the PR down to a single observation type, just ADPUPA for now. That makes it simpler to review and to make sure the setup works correctly with the real files.
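
A skipped integration test along the lines suggested above, limited to ADPUPA, might be laid out like this. It is only a sketch: the file paths and the two helper functions are hypothetical placeholders, not the API of this PR, and the real test would fetch the files from the NNJA archive.

```python
from pathlib import Path

import pytest

# Hypothetical local paths; the real files would come from the NNJA archive.
BUFR_FILE = Path("data/adpupa_sample.bufr")
REFERENCE_PARQUET = Path("data/adpupa_sample.parquet")


def convert_bufr_to_rows(path: Path) -> list:
    """Placeholder for the ADPUPA processor in this PR; not the actual API."""
    raise NotImplementedError


def load_reference_rows(path: Path) -> list:
    """Placeholder for reading the NNJA-AI Parquet reference."""
    raise NotImplementedError


@pytest.mark.skip(reason="needs large historical NNJA files; run locally")
def test_adpupa_matches_reference():
    # Decode the real BUFR file and compare it with the published reference.
    produced = convert_bufr_to_rows(BUFR_FILE)
    expected = load_reference_rows(REFERENCE_PARQUET)
    assert produced == expected
```

Because `pytest.mark.skip` is unconditional, the mark has to be removed to run the test locally. A `pytest.mark.skipif` keyed on an environment variable would avoid that edit, at the cost of a slightly less obvious default.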


source_name = "ADPUPA"

def _build_mappings(self):
Member

Is that the only data in the NNJA-AI ADPUPA? I thought there were more variables

Contributor Author

I only mapped the basic ones so far, and I actually wanted to ask you about that. I was leaning towards the primary descriptors I saw when I loaded conv-adpupa-NC002001 for reference.

source_name = "CrIS"

def _build_mappings(self):
    self.field_mappings = {
Member

This should definitely have more fields, I think?

Comment thread graph_weather/data/bufr_process.py Outdated
Member

Why is this being moved and renamed? Seems unnecessary

Comment thread graph_weather/data/weather_station_reader.py
Comment thread tests/bufr_process/conftest.py Outdated

Development

Successfully merging this pull request may close these issues.

BUFR processor to read NOMADS BUFR files into NNJA-AI format

2 participants